Building a Turkish ASR system with minimal resources
Authors
Abstract
We present an open-vocabulary Turkish news transcription system built with almost no language-specific resources. Our acoustic models are bootstrapped from those of a well-trained source language (Italian), without using any transcribed Turkish data. For language modeling, we apply unsupervised word segmentation induced with a state-of-the-art technique (Creutz and Lagus, 2005), and we introduce a novel method to lexicalize suffixes and to recover their surface form in context without the need for a morphological analyzer. Encouraging results obtained on a small test set are presented and discussed.
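Word segmentation for an open-vocabulary language model can be illustrated with a small Viterbi search over a unigram morph model. This follows the spirit of how Morfessor segments words once its morph lexicon has been learned. The sketch below is not the authors' pipeline. The lexicon `MORPH_PROBS` and its probabilities are invented toy values, and the helper names (`viterbi_segment`, `to_lm_tokens`, `rejoin`) are hypothetical. The `+` prefix that marks word-internal morphs is also an assumed convention. Morfessor's lexicon induction and the paper's suffix lexicalization and surface-form recovery are not reproduced here.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy morph lexicon with unigram probabilities (invented values).
# In a Morfessor-style setup these would be learned from unannotated text.
MORPH_PROBS: Dict[str, float] = {
    "ev": 0.10, "ler": 0.08, "de": 0.07, "den": 0.05,
    "kitap": 0.06, "lar": 0.08, "da": 0.07, "okul": 0.05,
}


def viterbi_segment(word: str, probs: Dict[str, float],
                    unk_penalty: float = 20.0) -> List[str]:
    """Return the morph sequence for `word` with minimal total cost (-log p).

    Single characters missing from the lexicon receive a large fixed cost,
    so any word can still be segmented (open vocabulary)."""
    n = len(word)
    # best[i] = (lowest cost of segmenting word[:i], start index of last morph)
    best: List[Tuple[float, int]] = [(0.0, 0)] + [(math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in probs:
                cost = -math.log(probs[piece])
            elif end - start == 1:
                cost = unk_penalty  # fallback for unknown single characters
            else:
                continue
            total = best[start][0] + cost
            if total < best[end][0]:
                best[end] = (total, start)
    # Backtrack from the end of the word to recover the best morph sequence.
    morphs: List[str] = []
    end = n
    while end > 0:
        start = best[end][1]
        morphs.append(word[start:end])
        end = start
    return morphs[::-1]


def to_lm_tokens(word: str, probs: Dict[str, float]) -> List[str]:
    """Segment a word and prefix non-initial morphs with '+' for the LM."""
    morphs = viterbi_segment(word, probs)
    if not morphs:
        return []
    return [morphs[0]] + ["+" + m for m in morphs[1:]]


def rejoin(tokens: List[str]) -> List[str]:
    """Glue '+'-prefixed morphs back onto the preceding token to form words."""
    words: List[str] = []
    for tok in tokens:
        if tok.startswith("+") and words:
            words[-1] += tok[1:]
        else:
            words.append(tok)
    return words


if __name__ == "__main__":
    print(to_lm_tokens("evlerde", MORPH_PROBS))  # ['ev', '+ler', '+de']
    print(rejoin(["ev", "+ler", "+de", "okul", "+da"]))  # ['evlerde', 'okulda']
```

Tokens such as `+ler` let a morph-based n-gram model produce word forms it never saw as whole words in training. `rejoin` then turns the recognizer's morph output back into words. Real Turkish suffixes change their surface form with context (for example `ler`/`lar`, `de`/`da`). This toy lexicon simply lists each variant separately and does not model that alternation.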